“In the name of God, I, take you, to be my husband/wife, to have and to hold from this day forward, for better or worse, for richer or poorer, in sickness and in health, to love and to cherish all the days of my life. This is my solemn vow.”




So you still believe in fairy tale?

Time to wake up.

1 Introduction

In this project, we seek to find out features that may affect people’s marriage status. Are they truly loyal to their wedding vows? Or are there certain patterns or factors that may lead their marriage to end?

We choose ‘marital status’ and ‘number of times married’ as measurement of a person’s performance on marriage. By relating this to other factors, such as income, eduction level, or health status, we wish to find some patterns upon them.

We may find answers for some interesting questions, for example, does high income help people keep marriage steady? Should we listen to parents’ suggestions to break up with boyfriend who has a severe disease or disability? Is it a good idea to get married with a person who has a doctor degree?

Our goal is to find related factors that may bring negative effects to marriage, and avoid them.

2 Data Selection

2.1 Marriage Performance

We tag marriage performance by two measurement: ‘Marital Status’ and ‘Number of Times Married’.

  1. fac = 1 (Good Marriage Performance): the number of times married is one and the marital status is ‘not divorced’ now, we think this person does very well in marriage.

  2. fac = 2 (Poor Marriage Performance): the number of times married is one and the marital status is ‘divorced’ now, or the number of times is two and the marital status is ‘not divorced’ now, we think this person performs not bad in marriage.

  3. fac = 3 (Inferior Marriage Performance): the number of times married is three, or this number is two and the marital status is ‘divorced’ now, we think this person does very poor in marriage.

databad <- data_ori[(MAR == 3 | MARHT >= 2)]
datagood <- data_ori[(MAR != 3 & MARHT < 2)]

#Define a good marriage attitude
datagood[, fac := 1, ]

#Define a poor marriage attitude
databad1 <- databad[!(MARHT >= 3 | (MARHT == 2 & MAR == 3))]
databad1[, fac := 2, ]

#Define a inferior marriage attitude
databad2 <- databad[MARHT >= 3 | (MARHT == 2 & MAR == 3)]
databad2[, fac := 3, ]

data_f <- rbind(datagood, databad1, databad2)

2.2 Disability Record

  1. DIS = 1: with a disability
  2. DIS = 2: without a disability
data_dis <- data_final[!is.na(DIS), list(DIS, fac)]

2.3 Race Code

  1. RAC1P = 1: White alone
  2. RAC1P = 2: Black or African American alone
  3. PAC1P = 3: American Indian alone
  4. PAC1P = 4: Asian alone
data_RAC1P <- data_final[RAC1P==1|RAC1P==2|RAC1P==3|RAC1P==6, list(RAC1P, fac)]

2.4 Educational Attainment

  1. SCHL = 1: No schooling completed
  2. SCHL = 16: Regular high school diploma
  3. SCHL = 21: Bachelor’s degree
  4. SCHL = 22: Master’s degree
  5. SCHL = 24: Doctorate degree
data_SCHL <- data_final[SCHL==1|SCHL==16|SCHL==21|SCHL==22|SCHL==24, list(SCHL, fac)]

2.5 Wages or Salary Income Past 12 Months

data_WAGP <- data_final[!is.na(WAGP), list(WAGP, fac)]

2.6 Military Service

  1. MIL = 1: Now on active duty
  2. MIL = 2: On active duty in the past, but not now
  3. MIL = 3: Only on active duty for training in Reserves/National Guard
  4. MIL = 4: Never served in the military
data_MIL <- data_final[!is.na(MIL), list(MIL, fac)]

3 Weight Incorporation

We removed all NA and balanced weight for the sample data.

data_all_var <- data_final[!is.na(DIS) & (RAC1P==1|RAC1P==2|RAC1P==3|RAC1P==6) & (SCHL==1|SCHL==16|SCHL==21|SCHL==22|SCHL==24) & !is.na(WAGP) & !is.na(MIL), list(DIS, RAC1P, SCHL, WAGP, MIL, PWGTP, fac)]

4 Data Visualization

## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:graphics':
## 
##     layout
## Loading required package: rCharts
## Loading required package: plyr

4.1 ‘For Richer or Poorer’ — Wages VS Marriage Performance

a. Boxplot

#plot
##wages: 
# 1.box-plots for wage
p <- plot_ly(ggplot2::diamonds, y = data_raw$WAGP  , color = data_raw$fac , type = "box")
p

b. Bar Chart

# 2. bar chart for wage
p <- plot_ly(
  x = c("A","B","C","D","E","F","G","H","I","J"),
  y = c(377034, 25113657,   5645546,    7056107,    944659, 383814, 4157641,    2427844,    6522414,    6172354),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("A","B","C","D","E","F","G","H","I","J"),
  y = c(114668, 9748396,    2379307,    2767572,    257943, 95312,  1315064,    696605, 2822095,    2182342),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("A","B","C","D","E","F","G","H","I","J"),
  y = c(30508,  3396854,    696597, 708733, 46991,  18961,  278326, 161293, 758216, 517508),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

4.2 ‘No Matter Who You Are’ — Races VS Marriage Performance

a. Bar Chart

# 3.bar chart for race
library(plotly)
p <- plot_ly(
  x = c("white", "black/africa Am.", "Am. Indian", "Asian"),
  y = c(48622273,4629716,204187,5344894),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("white", "black/africa Am.", "Am. Indian", "Asian"),
  y = c(430131,11367450,6906974,3135660,539089),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("white", "black/africa Am.", "Am. Indian", "Asian"),
  y = c(148481,3797101,1722793,797134,148478),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot

c. Conclusion

White people tend to have more divoice and poor marriage performance, and the same with black people. asian tend to have better marriage perfoemance while indian american may not be the same. .

4.3 ‘No Matter How You Are’ — Education Level VS Marriage Performance

a. Bar Chart

##4. barchart for education
library(plotly)
p <- plot_ly(
  x = c("No completed school", "high school", "bachelor", "master", "doctorate"),
  y = c(1372324,23810758,2180033,1725501),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("No completed school", "high school", "bachelor", "master", "doctorate"),
  y = c(430131,11367450,6906974,3135660,539089),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("No completed school", "high school", "bachelor", "master", "doctorate"),
  y = c(148481,3797101,1722793,148478),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot

c. Conclusion

People with high school degree are more likely to have a “poor” and “inferior” marriage attitude, while people with other degree are more likely to have a “good” marriage attitude.

4.4 ‘In Sickness and in Health’ — Military Status VS Marriage Performance

a. Bar Chart

##5. barchart for military
p <- plot_ly(
  x = c("No duty", "Duty in past", "Only duty for training", "Never served"),
  y = c(202241,4757340,888360,52953129),
  name = "good marriage performance",
  type = "bar")
p2 <- add_trace(
  p,
  x = c("No duty", "Duty in past", "Only duty for training", "Never served"),
  y = c(50390,2541718,409714,19377482),
  name = "poor marriage performance",
  type = "bar")
p3 <- add_trace(
  p2,
  x = c("No duty", "Duty in past", "Only duty for training", "Never served"),
  y = c(7198,1047573,147901,5411315),
  name = "inferior marriage performance",
  type = "bar")
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot

c. Conclusion

From the graphs above, we can see people who never have military history have a better performance than others.

4.5 Disability VS Marriage Performance

# 1. bar chart for disability
p <- plot_ly(
  x = c('with disability', 'without disability'),
  y = c(7924298,50876772),
  name = 'good marriage performance',
  type = 'bar'
)
p2 <- add_trace(
  p,
  x = c('with disability', 'without disability'),
  y = c(3865462,18513842),
  name = 'poor marriage performance',
  type = 'bar'
)
p3 <- add_trace(
  p2,
  x = c('with disability', 'without disability'),
  y = c(1568850, 5045137),
  name = 'inferior marriage performance',
  type = 'bar'
)
p4 <- layout(p3, barmode = "stack")
p4

b. Dependency Plot

c. Conclusion

From the graphs above, we can see people with disability are more likely to perform worse in marriage than people without disability. .

4.6 Parallel Coordinates Plots

Parallel coordinates is a visualization technique used to plot individual data elements across many dimensions. Each of the dimensions corresponds to a vertical axis and each data element is displayed as a series of connected points along the dimensions/axes. This technique can have an explicit explanation of the corresponding categorical parameters.

## Parallel coordinates plots

subdata = as.data.frame(subdata)
subdata_new = subdata[,c(1,2,3,4,5,7)]
scpcp(subdata_new, sel="data[,6]", sel.palette = "w")

4.7 Interactive Parallel Coordinates Plot

Alghouth, the previous static plot shows the relationships among those categorical variabales, this interactive parallel coordinates plot presents those relationships more explicitly.

(Click me!)

5 Conclusion

Sometimes love is blind, from the results above, we can see that income, health and education level can really influence people’s marriage. If you are not single, think more about your future and then decide whether to get married with your current couple or not. If you are single, don’t worry about your future, it is worthwhile to spend more time in looking for your soulmate.